Day 2 Session 2:
Dynamic spectral analysis
using Generalised Additive Mixed-Effect Models (GAMMs)
Introduction
In the previous session, we tried running a statistical analysis using linear mixed-effects modelling. It was a static analysis based on a single-point measurement of formant frequencies to characterise differences in the English /l/ and /ɹ/ quality produced by L1 Japanese and L1 English speakers.
While we observed some key between-group differences in the static data, the static analysis did not allow us to answer why such differences would occur. Specifically, it was not clear whether it is just the height of the formant frequencies that differs, or whether the formant frequency difference is just a small part of a broader, more fundamental difference.
More crucially, analysing the liquid consonants based on the static analysis lacks the consideration that English liquid consonants are inherently dynamic. This means that their spectral properties vary as a function of time, and a single-point measurement of formant frequencies is not adequate in characterising the English liquid quality. Also, English liquids show a strong interaction with the neighbouring vowels, so we need to consider how English liquids are coarticulated with the following vowel.
Taken together, we need to take the temporal dimension into account when analysing English liquids. The possible liquid-vowel interaction may mean that L1 Japanese speakers may produce the sequence of liquid consonant and a vowel with a different interaction pattern from that of L1 English speakers.
In this session, therefore, let’s try modelling the whole formant contours directly using Generalised Additive Mixed-Effect Models (GAMMs).
GAMMs: Roadmap
Here is a brief road map of the GAMMs section.
Generalised Additive Mixed-Effect Models (GAMMs)
Preliminaries
Installing/loading packages
As always, let’s first install and load the R packages that we will be using, just as in the static analysis section. The installation commands have been commented out; please uncomment and run them if the packages are not yet on your machine.
# installing packages
# install.packages("tidyverse")
# install.packages("mgcv")
# install.packages("itsadug")
# install.packages("tidymv")
# install.packages("tidygam")
# importing packages
library(tidyverse)
library(mgcv)
library(itsadug)
# library(tidymv) # for 'get_gam_predictions'
library(tidygam) # for 'get_gam_predictions'
source("https://raw.githubusercontent.com/soskuthy/gamm_intro/master/gamm_hacks.r")
Data wrangling
Check data
We always start with inspecting the data set using
colnames().
## [1] "...1" "file" "speaker" "language"
## [5] "f0" "duration" "segment" "previous_sound"
## [9] "next_sound" "word" "f1" "f2"
## [13] "f3" "time" "IsApproximant" "IsAcoustic"
## [17] "note" "gender" "omit" "Barkf1"
## [21] "Barkf2" "Barkf3" "f2f1" "f3f2"
## [25] "Barkf2f1" "Barkf3f2" "position" "context"
## [29] "liquid" "input_file"
Omitting irrelevant columns
We’ll omit the columns we don’t need.
# Let's check that all remaining tokens are "approximant" tokens
df_dyn |>
dplyr::group_by(IsApproximant) |>
dplyr::summarise() |>
dplyr::ungroup()
## # A tibble: 1 × 1
## IsApproximant
## <chr>
## 1 yes
# Let's check that all remaining tokens are of good recording quality
df_dyn |>
dplyr::group_by(IsAcoustic) |>
dplyr::summarise() |>
dplyr::ungroup()
## # A tibble: 1 × 1
## IsAcoustic
## <chr>
## 1 yes
# Let's check that no tokens are flagged for omission
df_dyn |>
dplyr::group_by(omit) |>
dplyr::summarise() |>
dplyr::ungroup()
## # A tibble: 1 × 1
## omit
## <dbl>
## 1 0
# Remove columns that we no longer need
df_dyn <- df_dyn |>
dplyr::select(-c(IsApproximant, IsAcoustic, omit, Barkf1, Barkf2, Barkf3, Barkf2f1, Barkf3f2, f2f1, f3f2))
Let’s check the column names again.
## [1] "...1" "file" "speaker" "language"
## [5] "f0" "duration" "segment" "previous_sound"
## [9] "next_sound" "word" "f1" "f2"
## [13] "f3" "time" "note" "gender"
## [17] "position" "context" "liquid" "input_file"
Let’s also convert the context column into IPA symbols for a more intuitive representation:
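The recoding chunk itself is not shown in this handout. A minimal sketch using dplyr::case_when, assuming the original context labels are ARPABET-style codes such as "AE1", "IY1", and "UW1" (check your own data for the actual labels):

```r
library(dplyr)

# toy stand-in for the context column (hypothetical ARPABET-style labels)
df_demo <- tibble(context = c("AE1", "IY1", "UW1", "AE1"))

# recode the labels into IPA symbols
df_demo <- df_demo |>
  mutate(context = case_when(
    context == "AE1" ~ "/æ/",
    context == "IY1" ~ "/i/",
    context == "UW1" ~ "/u/",
    TRUE ~ context   # leave any other labels unchanged
  ))
```

With the real data, you would apply the same mutate() call to df_dyn.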
Checking the number of participants, tokens…
Let’s also obtain some descriptive statistics here. Note that we need to divide the number of rows by 11 to obtain the accurate number of tokens, as one token now has 11 time points.
# number of participants
df_dyn |>
dplyr::group_by(language) |>
dplyr::summarise(n = n_distinct(speaker)) |>
dplyr::ungroup()
## # A tibble: 2 × 2
## language n
## <chr> <int>
## 1 English 14
## 2 Japanese 31
# number of tokens per segment
df_dyn |>
dplyr::group_by(segment) |>
dplyr::summarise(n = n()/11) |> # divide by 11 time points
dplyr::ungroup()
## # A tibble: 6 × 2
## segment n
## <chr> <dbl>
## 1 LAE1 511
## 2 LIY1 408
## 3 LUW1 299
## 4 RAE1 502
## 5 RIY1 481
## 6 RUW1 314
Data visualisation
Scaling formant frequencies
Do you remember how to visualise the dynamic data? The basic procedure is the same as in the static analysis: we first apply z-score normalisation to the formant frequencies to make sure that formant values are comparable across speakers.
df_dyn <- df_dyn |>
dplyr::group_by(speaker) |> # tell R to do the following iteration per speaker
dplyr::mutate(
f1z = scale(as.numeric(f1)), # scale f1 into z-score
f2z = scale(as.numeric(f2)), # scale f2 into z-score
f3z = scale(as.numeric(f3)) # scale f3 into z-score
) |>
dplyr::ungroup() # don't forget ungrouping
Descriptive statistics
Let’s check the mean and SD for both raw and normalised formant values; just look at F1 for now. Note that the mean z-scores do not look exactly zero, but this is simply floating-point rounding: computers cannot represent decimal numbers exactly, so tiny non-zero values (on the order of 1e-16) appear where the true mean is zero.
# check mean and sd of raw/scaled F1 values for each speaker
df_dyn |>
dplyr::group_by(speaker) |>
dplyr::summarise(
f1_mean = mean(f1),
f1_sd = sd(f1),
f1z_mean = mean(f1z),
f1z_sd = sd(f1z)
) |>
dplyr::ungroup()
## # A tibble: 45 × 5
## speaker f1_mean f1_sd f1z_mean f1z_sd
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 2d57ke 468. 140. -9.05e-17 1
## 2 2drb3c 575. 243. 2.39e-16 1
## 3 2zy9tf 459. 228. -2.41e-16 1
## 4 3bcpyh 438. 142. -5.72e-18 1.00
## 5 3hsubn 537. 177. 6.11e-17 1
## 6 3pzrts 458. 195. -1.44e-18 1.00
## 7 4ps8zx 467. 172. 2.19e-16 1
## 8 54i2ks 453. 192. 1.43e-16 1.00
## 9 5jzj2h 412. 133. 2.49e-16 1.00
## 10 5upwe3 444. 189. -6.51e-17 1.00
## # ℹ 35 more rows
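A quick base-R check of this point: the mean of any z-scored vector is zero up to floating-point rounding.

```r
set.seed(1)
x <- rnorm(100)
z <- scale(x)   # z-score normalisation

mean(z)                         # tiny but non-zero, e.g. on the order of 1e-17
isTRUE(all.equal(mean(z), 0))   # zero within numerical tolerance
```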
Visualisation
raw trajectories
Let’s visualise the formant trajectories here. In the dynamic analysis, it is almost always the case that we take the time dimension on the x-axis and the dependent variable on the y-axis. This allows us to see how e.g., F1 changes over time.
We also make sure about the variable grouping to
tell ggplot2 how to organise the data. This can be done via
group argument in the geom function. Each
formant trajectory should come from one audio file, which is stored in
the file column. We use this information for grouping.
# F1 - raw trajectories
df_dyn |>
ggplot(aes(x = time, y = f1z)) +
geom_point(aes(colour = language, group = file), size = 0.3, alpha = 0.4) +
geom_path(aes(colour = language, group = file), linewidth = 0.3, alpha = 0.4) +
geom_hline(yintercept = 0, linetype = "dashed", linewidth = 0.5) +
scale_colour_manual(values = cbPalette) +
facet_grid(liquid ~ context) +
labs(x = "time", y = "F1 (z-normalised)", title = "time-varying change in F1 frequency") +
theme(strip.text.y = element_text(angle = 0))
smooths
While I usually prefer plotting raw trajectories because they are faithful to the nature of the data, I must admit that it is sometimes very difficult to see what’s going on in them.
If you prefer, we could also just plot smooths to
highlight the nonlinear between-group difference. The code below adds
smoothed F1z trajectories to the raw data we just plotted (but I have
commented out the raw trajectories for now). Note the difference in
grouping; we used the file variable for the raw
trajectories, but for smooths we need to use the language
variable because we would like one smoothed trajectory for each L1
group.
# F1 - smooths
df_dyn |>
ggplot(aes(x = time, y = f1z)) +
# geom_point(aes(colour = language, group = file), size = 0.3, alpha = 0.1) +
# geom_path(aes(colour = language, group = file), linewidth = 0.3, alpha = 0.1) +
geom_smooth(aes(colour = language, group = language), linewidth = 1.2, se = TRUE) +
geom_hline(yintercept = 0, linetype = "dashed", linewidth = 0.5) +
scale_colour_manual(values = cbPalette) +
facet_grid(liquid ~ context) +
labs(x = "time", y = "F1 (z-normalised)", title = "smoothed time-varying change in F1 frequency") +
theme(strip.text.y = element_text(angle = 0))
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
From linear to non-linear modelling: Introducing Generalised Additive Mixed Effects Models (GAMMs)
Modelling non-linearity in the data
Here, we would like to model the formant trajectory directly, without relying on just a straight line. This can be illustrated by the plot below, which shows the time-varying transition of F2 (in Hz) for the word reap produced by an L1 Japanese speaker.
As you can see in the plot below, the linear regression line (in yellow) does not capture the overall trend of the formant trajectory. What we would ideally like is the non-linear curve in sky blue.
One solution to non-linear modelling is Generalised Additive (Mixed Effects) Models: GA(M)Ms. While GA(M)Ms are similar to linear models in that both can capture linear trends with parametric terms, GA(M)Ms also incorporate smooth terms that can capture non-linear relationships between the variables. Put simply, a GA(M)M is a combination (i.e., addition) of multiple (smooth) functions, which can be notated as:
\(y = \alpha + f(x) + \epsilon\)
where \(f(x)\) just means some function of \(x\) and \(\epsilon\) indicates an error term.
Basis functions
GA(M)Ms model the non-linearity in the data by combining multiple basis functions. The number of basis functions is related to the smoothness/wiggliness of a given GA(M)M: for example, there is more room for a GA(M)M trajectory to be wigglier if the number of basis functions is high. The smoothness/wiggliness of a GA(M)M is determined by the smoothing parameters, which GAMs estimate automatically from the data, and the number of basis functions sets an upper limit on how wiggly a model can be.
What you can adjust in your model is the number of knots. A knot is a joining point between the basis functions, and the number of knots corresponds to the number of basis functions + 1. In the plot below, I show ten basis functions based on cubic regression splines, which means that there are eleven knots specified in the GAM model.
There are a few types of basis functions that can be specified in a GAMM. Common ones include thin plate regression splines (tp) and cubic regression splines (cr). They look quite different from each other, but the resulting final smooth curves end up very similar.
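As a quick sketch on simulated data (not the liquid data set), the basis type and the basis dimension can be set via the bs and k arguments of s(); the two bases give near-identical fits:

```r
library(mgcv)  # ships with R as a recommended package

set.seed(1)
x <- seq(0, 1, length.out = 200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)

# thin plate regression splines (mgcv's default basis)
m_tp <- gam(y ~ s(x, bs = "tp", k = 10))
# cubic regression splines with the same basis dimension
m_cr <- gam(y ~ s(x, bs = "cr", k = 10))

# the fitted smooths are almost indistinguishable
cor(fitted(m_tp), fitted(m_cr))
```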
Parametric and smooth terms
Let’s now go back again to our data set and see how a GAMMs model can be specified on R. For example, the model below is a GAM model (without a random effect) that captures time-varying non-linearity along the F2 dimension:
mgcv::bam(f2z ~ language + context + s(time) + s(time, by = language) + s(time, by = context), data = df_dyn_L)
The model specification notation is somewhat similar to the one for
the lme4::lmer() convention for the linear models, so
hopefully this isn’t too complex for you.
As explained earlier, a GAM model can take two kinds of predictor
variables: parametric and smooth
terms. Parametric terms specify predictors that exert a
constant, overall influence on the dependent variable. This
usually corresponds to the height of the trajectory. In
the model above, this corresponds to language and
context.
The new addition from the linear model would be smooth
terms which allow GA(M)Ms to fit non-linear effects as a
function of a variable. Smooth terms are notated as s().
This usually corresponds to the shape of the
trajectory. And in the model above, you can see a series of
s() terms, including s(time),
s(time, by = language) and
s(time, by = context). We will cover the specifics later,
but basically these instruct GAM to model the non-linearity seen
in F2 (dependent variable) as a function of time.
Let’s model F2 trajectory
Let’s try modelling a simple GAMMs model based on our data set. We
start with a simple GAM model to model the effects of
language and context on the F2 trajectory.
As usual, we’ll first subset the data and then convert the variables into factors, a procedure similar to the static analysis.
# subsetting data -- for /l/
df_dyn_L <- df_dyn |>
dplyr::filter(liquid == "L")
# language variable
df_dyn_L$language <- as.factor(df_dyn_L$language)
levels(df_dyn_L$language)
## [1] "English" "Japanese"
# context variable
df_dyn_L$context <- as.factor(df_dyn_L$context)
levels(df_dyn_L$context)
## [1] "/æ/" "/i/" "/u/"
# speaker variable -- random effect
df_dyn_L$speaker <- as.factor(df_dyn_L$speaker)
levels(df_dyn_L$speaker)
## [1] "2d57ke" "2drb3c" "2zy9tf" "3bcpyh" "3hsubn" "3pzrts" "4ps8zx" "54i2ks"
## [9] "5jzj2h" "5upwe3" "6p63jy" "7cd4t4" "9c4efu" "bfwizh" "birw55" "bj8mjm"
## [17] "byxcff" "c5y8z6" "cdsju7" "cu2jce" "dbtzn2" "dcxuft" "ds6umh" "f9japd"
## [25] "fgd95u" "fkcwjr" "h5s4x3" "heat7g" "hgrist" "i3wa7f" "jcy8xi" "kjn9n4"
## [33] "m46dhf" "m5r28t" "s6a8gh" "srky8g" "tay55n" "uig6n9" "ut4e5m" "we8z58"
## [41] "wrgwc3" "xub9bc" "z3n578" "zajk25" "zz3r2g"
# word variable -- random effect
df_dyn_L$word <- as.factor(df_dyn_L$word)
levels(df_dyn_L$word)
## [1] "lamb" "lamp" "lap" "leaf" "leap" "leave" "loom" "lube"
A simple (linear) model only with parametric terms
What if we just model F2 only with parametric terms? Let’s just start with language variable.
# fit a model with parametric terms only
m1 <- mgcv::bam(f2z ~ language, data = df_dyn_L, method = "fREML")
# model summary
summary(m1)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## f2z ~ language
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.04019 0.01355 2.965 0.00303 **
## languageJapanese -0.02048 0.01773 -1.155 0.24803
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## R-sq.(adj) = 2.5e-05 Deviance explained = 0.00996%
## -REML = 19166 Scale est. = 1.0225 n = 13398
The model output displays a section called Parametric
coefficients, which shows how each of the parametric terms has
an overall effect on the F2 values. But overall, the output looks pretty
much the same as that of linear mixed-effects modelling using
lme4::lmer().
Adding non-linearity
Our first model m1 models the constant effect of the
language variable on the F2 values; that is, we could ask
whether L1 English and L1 Japanese speakers differ overall in F2
frequencies. But, as we saw earlier in the data visualisation, there
was a lot going on beyond the constant between-group difference.
Let’s extend the model so that we can model the non-linear
between-group difference. Here, we add a new smooth
term, s(time, by = language), that allows us to model the non-linear
difference between L1 English and L1 Japanese speakers in the F2 values
over time.
# a model with a parametric and a smooth term
m2 <- mgcv::bam(f2z ~ language + s(time, by = language), data = df_dyn_L, method = "fREML")
# model summary
summary(m2)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## f2z ~ language + s(time, by = language)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.03474 0.01243 2.796 0.00519 **
## languageJapanese -0.01566 0.01625 -0.963 0.33549
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(time):languageEnglish 6.700 7.834 284.86 <2e-16 ***
## s(time):languageJapanese 5.141 6.258 49.89 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.16 Deviance explained = 16%
## fREML = 18022 Scale est. = 0.8594 n = 13398
The top half of the model summary looks similar to the one for
m1. A new addition here is Approximate significance
of smooth terms; this is where we can evaluate the performance
of the smooth terms. There are two columns that you may not be familiar
with, but edf would be the most important here.
edf stands for effective degrees of freedom. This signifies how many parameters are needed to represent the smooth.
- Edf indexes the degree of non-linearity of the smooth: an edf value close to 1 means that the pattern modelled with s() is linear, while larger values indicate greater non-linearity.
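This can be illustrated with a quick simulation (my own sketch, not the session data): a smooth fitted to truly linear data is penalised back to an edf near 1, while a wiggly pattern yields a much higher edf.

```r
library(mgcv)

set.seed(2)
x <- runif(300)
y_lin <- 2 * x + rnorm(300, sd = 0.2)            # truly linear relationship
y_wig <- sin(4 * pi * x) + rnorm(300, sd = 0.2)  # strongly non-linear relationship

m_lin <- gam(y_lin ~ s(x))
m_wig <- gam(y_wig ~ s(x))

summary(m_lin)$edf   # should be close to 1
summary(m_wig)$edf   # considerably larger than 1
```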
Visualising GAM
Unlike with linear models, the interpretation of non-linear patterns is quite tricky and complicated, so data visualisation is crucial when modelling non-linear relationships. Let’s visualise the GAM model that we just fitted.
The package itsadug lets you visualise the F2 smooths
for each language group (L1 English and L1 Japanese speakers) as
follows:
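The plotting chunk is missing from this handout; judging from the summary printed below, it was presumably an itsadug::plot_smooth() call along these lines (the exact arguments are my assumption, and the chunk below refits a toy stand-in model so it runs on its own):

```r
library(mgcv)
library(itsadug)

# toy stand-in for df_dyn_L and m2, which are defined earlier in the session
set.seed(3)
d <- data.frame(time = rep(seq(0, 100, length.out = 11), 40),
                language = factor(rep(c("English", "Japanese"), each = 220)))
d$f2z <- sin(d$time / 30) * (d$language == "English") + rnorm(nrow(d), sd = 0.3)
m2_demo <- bam(f2z ~ language + s(time, by = language), data = d)

# one fitted F2 smooth per language group
plot_smooth(m2_demo, view = "time", plot_all = "language", rug = FALSE)
```

With the session data, the same call would be applied to m2 instead of m2_demo.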
## Summary:
## * language : factor; set to the value(s): English, Japanese.
## * time : numeric predictor; with 30 values ranging from 0.000000 to 100.000000.
## * NOTE : No random effects in the model to cancel.
##
It is also possible to visualise the difference in F2 trajectories between the two speaker groups:
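Again, the plotting chunk itself is not shown; the summary below suggests an itsadug::plot_diff() call roughly like this (a sketch on a toy stand-in model, since m2 is defined earlier in the session):

```r
library(mgcv)
library(itsadug)

# toy stand-in for m2 (the real model uses df_dyn_L)
set.seed(4)
d <- data.frame(time = rep(0:10 * 10, 40),
                language = factor(rep(c("English", "Japanese"), each = 220)))
d$f2z <- (d$time / 100) * (d$language == "English") + rnorm(nrow(d), sd = 0.3)
m2_demo <- bam(f2z ~ language + s(time, by = language), data = d)

# estimated difference smooth (English minus Japanese); time windows
# of significant difference are printed to the console
plot_diff(m2_demo, view = "time",
          comp = list(language = c("English", "Japanese")))
```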
## Summary:
## * time : numeric predictor; with 100 values ranging from 0.000000 to 100.000000.
## * NOTE : No random effects in the model to cancel.
##
##
## time window(s) of significant difference(s):
## 0.000000 - 35.353535
## 41.414141 - 100.000000
We start to see some interesting bits here. The first plot suggests that, overall, L1 Japanese speakers (in light blue) show a flat F2 trajectory compared to L1 English speakers (in light pink). The difference between the two trajectories holds almost throughout the time course; the two trajectories cross over at approximately the 40% time point, where we find little difference, but otherwise the confidence interval for the difference smooth consistently excludes zero.
Model comparison
There are a couple of ways of testing statistical significance, but it is always recommended to do this via model comparison. The idea is similar to when we fit linear models, with only a few differences:
- Instead of using anova(), we use the itsadug::compareML() function.
- Also, make sure that your GA(M)Ms model is fitted with the Maximum Likelihood (ML) estimation method. This can be specified by the argument method = "ML". Whereas restricted ML (REML) and fast restricted ML (fREML) are computationally more efficient, they cannot be used for model comparison.
Here, let’s compare whether an addition of the smooth term results in a better model fit.
# re-fitting the first model with the Maximum Likelihood estimation
m1 <- mgcv::bam(f2z ~ language, data = df_dyn_L, method = "ML")
# re-fitting the second model with the Maximum Likelihood estimation
m2 <- mgcv::bam(f2z ~ language + s(time, by = language), data = df_dyn_L, method = "ML")
# model comparison
## full description
itsadug::compareML(m1, m2, print.output = TRUE, suggest.report = TRUE)
## m1: f2z ~ language
##
## m2: f2z ~ language + s(time, by = language)
##
## Report suggestion: The Chi-Square test on the ML scores indicates that model m2 is [significantly] better than model m1 (X2(4.00)=1146.117, p<2e-16).
## -----
## Model Score Edf Difference Df p.value Sig.
## 1 m1 19158.85 2
## 2 m2 18012.74 6 1146.117 4.000 < 2e-16 ***
##
## AIC difference: 2315.23, model m2 has lower AIC.
The output again looks similar to that of linear model
comparisons using anova(). It says that m2 has
a lower Akaike Information Criterion (AIC) score than m1.
It also shows that m2 has a lower (ML) score of 18012.74
compared to 19158.85 for m1. These two measures index the
degree of model fit, and the lower, the better. Overall, the model
comparison suggests that the model with a smooth m2
significantly improves the degree of model fit and is thus considered to
be a better model than the model with parametric terms only
m1.
Modelling difference smooths
As we are dealing with categorical predictors, it is possible to encode the non-linear difference into the model directly through difference smooths. If we stuck to modelling two separate smooths, one for each level of a predictor variable, we would never know whether the two trajectories are statistically significantly different. The difference smooth approach allows us to directly evaluate whether the between-group difference is significant or not.
To do this, we first need to convert the predictor variables into
ordered variables, a special case of factor variables.
Don’t forget to execute both as.ordered() and
contrasts(...) <- "contr.treatment", as both are important in GAMMs.
# converting language variable into ordered variable
df_dyn_L$language.ord <- as.ordered(df_dyn_L$language)
contrasts(df_dyn_L$language.ord) <- "contr.treatment"
When trying to model with difference smooths, you need to specify the following:
- a parametric term of the variable
- a smooth term over time without any further specification
- a smooth over time with the grouping specification by = language.ord
So, the model would look like this:
# model
m3 <- mgcv::bam(f2z ~ language.ord + s(time) + s(time, by = language.ord), data = df_dyn_L, method = "fREML")
# summary
summary(m3)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## f2z ~ language.ord + s(time) + s(time, by = language.ord)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.03474 0.01243 2.795 0.00519 **
## language.ordJapanese -0.01565 0.01625 -0.963 0.33566
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(time) 7.121 8.055 276.6 <2e-16 ***
## s(time):language.ordJapanese 5.840 6.906 102.9 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.16 Deviance explained = 16%
## fREML = 18026 Scale est. = 0.85941 n = 13398
Let’s take a look at the parametric coefficients first. The
intercept is statistically significant, meaning that it’s different from
zero. language.ordJapanese does not show statistical
significance, which would mean that there is no overall effect of
language on the F2 values.
Let’s now move to the smooth terms. The edf values
are considerably larger than 1 for both smooth terms, indicating that they
capture non-linearity. s(time) is statistically
significant, meaning that time is a strong predictor of
f2z. In other words, time is not just linearly
related to f2z, thus indicating the non-linearity. Also,
the statistical significance of
s(time):language.ordJapanese means that the relationship
between time and f2z depends on the
language.ord variable. That is, L1 English and L1 Japanese
speakers show different non-linear trends in f2z over time
at statistically significant level.
Comparison between m2 and m3
Let’s compare the output of m3 with that of
m2 that we saw earlier. For m2, we fitted
separate smooths for each level in language variable, and
the model output only shows whether each smooth is significantly
different from zero. I’ve reiterated the model specification here for
convenience:
# a model with a parametric and a smooth term
m2 <- mgcv::bam(f2z ~ language + s(time, by = language), data = df_dyn_L, method = "fREML")
# model summary
summary(m2)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## f2z ~ language + s(time, by = language)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.03474 0.01243 2.796 0.00519 **
## languageJapanese -0.01566 0.01625 -0.963 0.33549
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(time):languageEnglish 6.700 7.834 284.86 <2e-16 ***
## s(time):languageJapanese 5.141 6.258 49.89 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.16 Deviance explained = 16%
## fREML = 18022 Scale est. = 0.8594 n = 13398
Focussing on the smooth term table, it has two lines:
s(time):languageEnglish and
s(time):languageJapanese. Both show statistical
significance, meaning that f2z and time
interact non-linearly for both L1 English and L1 Japanese speakers. But
this does not necessarily tell us whether the two groups are different.
This is why modelling with difference smooths can be
handy when the primary interest lies in comparing between-group
differences in trajectory pattern.
Adding context
So far, we have modelled the non-linear relationships between
time and f2z only with the
language variable. Let’s add the context
variable here as well.
Your turn
Sticking to the difference smooths approach, could
you follow the procedures below and fit a model with
language and context variables?
Converting the context variable into ordered
variables
Fitting the model
Please define a model with language.ord and
context.ord, incorporated into both parametric and smooth
terms, and save the model as m4.
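For reference, the conversion in step 1 mirrors the language.ord code above. A sketch, demonstrated here on a stand-in data frame so the chunk runs on its own:

```r
# toy stand-in; in the session, you would apply this to df_dyn_L directly
df_demo <- data.frame(context = c("/æ/", "/i/", "/u/"))

# converting the context variable into an ordered variable
df_demo$context.ord <- as.ordered(df_demo$context)
contrasts(df_demo$context.ord) <- "contr.treatment"
```

With the real data, replace df_demo with df_dyn_L to create df_dyn_L$context.ord.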
# fitting a model with language.ord and context.ord
m4 <- mgcv::bam(f2z ~ language.ord + context.ord + s(time) + s(time, by = language.ord) + s(time, by = context.ord), data = df_dyn_L, method = "ML")
# model summary
summary(m4)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## f2z ~ language.ord + context.ord + s(time) + s(time, by = language.ord) +
## s(time, by = context.ord)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.487193 0.008799 -55.368 <2e-16 ***
## language.ordJapanese 0.022111 0.009026 2.450 0.0143 *
## context.ord/i/ 1.486657 0.010303 144.291 <2e-16 ***
## context.ord/u/ 0.007880 0.011298 0.698 0.4855
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(time) 7.586 8.346 292.90 <2e-16 ***
## s(time):language.ordJapanese 6.969 7.949 274.16 <2e-16 ***
## s(time):context.ord/i/ 6.613 7.717 495.69 <2e-16 ***
## s(time):context.ord/u/ 1.000 1.001 80.98 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.741 Deviance explained = 74.2%
## -ML = 10135 Scale est. = 0.26444 n = 13398
Model comparison
The practice here allows us to proceed onto model comparison between
the full model with both language.ord and
context.ord, and one without either of these. Could you
come up with a good way to do this? Would you be able to say what
effects you are testing through model comparison?
You can re-use m3 and m4. In addition, you
might also want to specify m5. Don’t also forget about
changing the estimation method.
# model with language.ord
m3 <- mgcv::bam(f2z ~ language.ord + s(time) + s(time, by = language.ord), data = df_dyn_L, method = "ML")
# model with both language.ord and context.ord
m4 <- mgcv::bam(f2z ~ language.ord + context.ord + s(time) + s(time, by = language.ord) + s(time, by = context.ord), data = df_dyn_L, method = "ML")
# model with context.ord
# m5 <- ...
# comparison
# ...
Significance testing
We’re close to building the full model! Now, let’s focus on the
statistical significance of the predictor variables, such as
language.ord and context.ord.
In GAMMs, non-linear trajectories are captured through smooth terms, while parametric terms represent linear effects. The trajectories can be analysed in terms of height (level) and shape (rate of change), which correspond to parametric and smooth terms, respectively.
Here’s how to assess the significance of each term:
1. Full model vs. nested model (excluding both parametric and smooth terms)
To determine the overall significance of a variable’s effect (both on height and shape):
- Compare the full model (which includes both the parametric and smooth terms for the variable of interest) with a nested model that excludes both the parametric and smooth terms associated with that variable.
Interpretation:
If the full model significantly improves the fit (via LRT, AIC, or WAIC), then the variable influences the dependent variable (either via an overall shift or a time-varying effect).
If there is no significant improvement, then the variable does not significantly impact the dependent variable.
2. Full model vs. nested model (including parametric terms only, for shape comparison)
If the full model was significantly better in Step 1, check whether the effect is shape-dependent by removing only the smooth term for that variable:
- Compare the full model (which includes both parametric and smooth terms for the variable) with another nested model that retains the parametric term but excludes the smooth term for the variable of interest.
Interpretation:
If the full model still fits significantly better, then the variable affects the trajectory shape (time-varying effect).
If the full model is not significantly better, then the variable’s effect is mainly static (height shift), not shape-dependent.
Let’s see how this works for the language.ord
effect:
Comparison 1: full model vs nested model without parametric/smooth terms
# full model
m4 <- mgcv::bam(f2z ~ language.ord + context.ord + s(time) + s(time, by = language.ord) + s(time, by = context.ord), data = df_dyn_L, method = "ML")
# nested model 1 -- excluding both parametric and smooth terms
m4_1 <- mgcv::bam(f2z ~ context.ord + s(time) + s(time, by = context.ord), data = df_dyn_L, method = "ML")
# model comparison
compareML(m4, m4_1, suggest.report = FALSE)
## m4: f2z ~ language.ord + context.ord + s(time) + s(time, by = language.ord) +
## s(time, by = context.ord)
##
## m4_1: f2z ~ context.ord + s(time) + s(time, by = context.ord)
##
## Chi-square test of ML scores
## -----
## Model Score Edf Difference Df p.value Sig.
## 1 m4_1 11133.99 9
## 2 m4 10135.37 12 998.615 3.000 < 2e-16 ***
##
## AIC difference: -2018.32, model m4 has lower AIC.
The model comparison suggests that the full model m4
significantly improves the model fit. Let’s now move onto another model
comparison.
Comparison 2: full model vs nested model only with parametric term
# full model
m4 <- mgcv::bam(f2z ~ language.ord + context.ord + s(time) + s(time, by = language.ord) + s(time, by = context.ord), data = df_dyn_L, method = "ML")
# nested model 2 -- including parametric term only
m4_2 <- mgcv::bam(f2z ~ language.ord + context.ord + s(time) + s(time, by = context.ord), data = df_dyn_L, method = "ML")
# model comparison
compareML(m4, m4_2, suggest.report = FALSE)
## m4: f2z ~ language.ord + context.ord + s(time) + s(time, by = language.ord) +
## s(time, by = context.ord)
##
## m4_2: f2z ~ language.ord + context.ord + s(time) + s(time, by = context.ord)
##
## Chi-square test of ML scores
## -----
## Model Score Edf Difference Df p.value Sig.
## 1 m4_2 11132.36 10
## 2 m4 10135.37 12 996.989 2.000 < 2e-16 ***
##
## AIC difference: -2017.05, model m4 has lower AIC.
The full model still improves the degree of model fit. This suggests
that language.ord has a statistically significant
effect on both the height and shape
dimensions, with a notable shape
difference in particular.
Your turn
How about the context.ord?
Comparison 1: full model vs nested model without parametric/smooth terms
# full model
m4 <- mgcv::bam(f2z ~ language.ord + context.ord + s(time) + s(time, by = language.ord) + s(time, by = context.ord), data = df_dyn_L, method = "ML")
# nested model 1 -- excluding both parametric and smooth terms
# m4_3 <- ...
# model comparison
# compareML(m4, m4_3, suggest.report = FALSE)
Comparison 2: full model vs nested model only with parametric term
Please write a code chunk if another model comparison is still necessary.
Random effects
Factor smooths
You might notice that we have modelled non-linear change of
f2z over time with fixed effects only, and we have ignored
by-speaker and by-item random effects. This is where the difference
between GAM and GAMM comes in: Generalised Additive Mixed Effect Models,
as the name suggests, can also incorporate random effects!
There are various ways to incorporate random effects, but here, we stick to one of them, which is based on factor smooths.
Factor smooths fit curves for individual participants (if specified
for speaker, participant etc) and for
individual items (if specified for word, item
etc). You could think of this as a non-linear equivalent of random
intercepts and random slopes in the linear mixed-effect models.
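Before applying this to the workshop data, here is a minimal self-contained sketch of what a factor smooth does, using simulated data (the names y, time, speaker, toy and m_fs are made up for illustration, not part of the workshop data):

```r
library(mgcv)

# simulated data: each "speaker" deviates non-linearly from a shared trajectory
set.seed(1)
n_speakers <- 6
n_points <- 11
time <- rep(seq(0, 100, by = 10), times = n_speakers)
speaker <- factor(rep(paste0("s", seq_len(n_speakers)), each = n_points))
# shared curve + per-speaker offset + noise
y <- sin(time / 30) +
  rep(rnorm(n_speakers, sd = 0.5), each = n_points) +
  rnorm(length(time), sd = 0.1)
toy <- data.frame(y = y, time = time, speaker = speaker)

# s(time, speaker, bs = "fs", m = 1): one smooth deviation per speaker,
# the non-linear analogue of by-speaker random intercepts and slopes
m_fs <- mgcv::bam(y ~ s(time) + s(time, speaker, bs = "fs", m = 1), data = toy)
summary(m_fs)
```

The summary lists a shared s(time) term plus a single s(time,speaker) term that absorbs all the by-speaker wiggles at once.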
Let’s start with accounting for the by-speaker factor smooths. In the
model below, a new smooth term is added:
s(time, speaker, bs = "fs", m = 1). This tells the model to
fit non-linear trajectories for f2z over time
for each speaker. bs = "fs" specifies factor smooths, and the final m = 1 is a control parameter for the degree of wiggliness of the by-speaker trajectories – we leave it as it is.
# full model with by-speaker random effects
m5 <- mgcv::bam(f2z ~ language.ord + context.ord + s(time) + s(time, by = language.ord) + s(time, by = context.ord) + s(time, speaker, bs = "fs", m = 1), data = df_dyn_L, method = "fREML")
## Warning in gam.side(sm, X, tol = .Machine$double.eps^0.5): model has repeated
## 1-d smooths of same variable.
##
## Family: gaussian
## Link function: identity
##
## Formula:
## f2z ~ language.ord + context.ord + s(time) + s(time, by = language.ord) +
## s(time, by = context.ord) + s(time, speaker, bs = "fs", m = 1)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.496646 0.036501 -13.606 <2e-16 ***
## language.ordJapanese 0.030994 0.043759 0.708 0.4788
## context.ord/i/ 1.515216 0.009583 158.119 <2e-16 ***
## context.ord/u/ 0.022558 0.010410 2.167 0.0302 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(time) 7.378 7.926 43.520 <2e-16 ***
## s(time):language.ordJapanese 6.580 7.278 27.036 <2e-16 ***
## s(time):context.ord/i/ 6.781 7.853 494.595 <2e-16 ***
## s(time):context.ord/u/ 4.206 5.164 27.335 <2e-16 ***
## s(time,speaker) 235.149 403.000 6.374 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.783 Deviance explained = 78.8%
## fREML = 9234.8 Scale est. = 0.22161 n = 13398
Compared to the model outputs earlier, there is one extra line in the
smooth term section. s(time,speaker) at the bottom
indicates the smooth term to account for by-speaker variation. This
shows statistical significance, meaning that it is worth including.
And this is what the by-speaker factor smooths look like:
Confidence intervals
Adding random effects results in wider confidence intervals, because the model now allows for the variability arising from those random effects. In other words, models without random effects have narrower confidence intervals than they should, which might lead to Type I errors (i.e., detecting statistical significance where there is none).
Here is a comparison of separate smooths between m4 and
m5.
# m4: model without factor smooths
# m4 <- mgcv::bam(f2z ~ language.ord + context.ord + s(time) + s(time, by = language.ord) + s(time, by = context.ord), data = df_dyn_L, method = "ML")
itsadug::plot_smooth(m4, view = "time", plot_all = "language.ord", cond = list(context.ord = "/æ/"))
## Summary:
## * language.ord : factor; set to the value(s): English, Japanese.
## * context.ord : factor; set to the value(s): /æ/.
## * time : numeric predictor; with 30 values ranging from 0.000000 to 100.000000.
## * NOTE : No random effects in the model to cancel.
##
# m5: model with factor smooths
# m5 <- mgcv::bam(f2z ~ language.ord + context.ord + s(time) + s(time, by = language.ord) + s(time, by = context.ord) + s(time, speaker, bs = "fs", m = 1), data = df_dyn_L, method = "fREML")
itsadug::plot_smooth(m5, view = "time", plot_all = "language.ord", cond = list(context.ord = "/æ/"))
## Summary:
## * language.ord : factor; set to the value(s): English, Japanese.
## * context.ord : factor; set to the value(s): /æ/.
## * time : numeric predictor; with 30 values ranging from 0.000000 to 100.000000.
## * speaker : factor; set to the value(s): ds6umh. (Might be canceled as random effect, check below.)
## * NOTE : The following random effects columns are canceled: s(time,speaker)
##
The confidence interval also becomes wider in the difference smooths for m5, which includes the factor smooths:
### difference smooth
itsadug::plot_diff(m4, view = "time", comp = list(language.ord = c("Japanese", "English")), cond = list(context.ord = "/æ/"), print.summary = FALSE, main = "Japanese-English difference in the /æ/ context")
itsadug::plot_diff(m5, view = "time", comp = list(language.ord = c("Japanese", "English")), cond = list(context.ord = "/æ/"), print.summary = FALSE, main = "Japanese-English difference in the /æ/ context")
Your turn
In a similar manner to the by-speaker random effects, could you write a code chunk for a model containing both by-speaker and by-item (operationalised here as word) random effects?
Also, how do the by-item factor smooths look?
Correcting autocorrelation
Looks like we’re pretty much all set – well done on getting this far! However, there is one more important aspect to consider, and we’ll need to look into the assumptions under which GAMMs operate.
If you remember from the linear models, I explained that statistical tests have assumptions that need to be met, and these concerned the residuals. Residuals are the differences between observed and fitted values: in linear models, casually speaking, we should not expect any visible structure in the residual distribution under the normality and constant variance assumptions.
Something similar can be said here: residuals need to be independent of each other for GAMMs, too. In simpler terms, the model does not (yet) know whether the data points are completely independent of each other or sampled from a series of time-series trajectories. This has important consequences for estimation accuracy, so we need to address it.
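To see why this matters, here is a small simulated sketch (not the workshop data; the names d, fit_indep, fit_ar1 are made up for illustration): a model that ignores AR(1) errors reports smaller standard errors than one that models them, illustrated here with lm() versus nlme::gls().

```r
library(nlme)

# pure AR(1) noise: there is no real trend over time
set.seed(123)
n <- 400
d <- data.frame(
  time = 1:n,
  y = as.numeric(arima.sim(model = list(ar = 0.8), n = n))
)

fit_indep <- lm(y ~ time, data = d)                       # assumes independence
fit_ar1 <- nlme::gls(y ~ time, data = d,
                     correlation = corAR1(form = ~ time)) # models AR(1) errors

se_indep <- sqrt(diag(vcov(fit_indep)))["time"]
se_ar1 <- sqrt(diag(vcov(fit_ar1)))["time"]
c(independent = unname(se_indep), ar1 = unname(se_ar1))
```

The AR(1)-aware standard error is clearly larger: the independence-assuming model is overconfident, which is exactly the risk for GAMMs fitted to trajectories without an autocorrelation correction.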
The solution is to incorporate an AR(1) model into your GAMMs model. By adding the AR(1) model, we define:
an extra parameter rho (\(\rho\))
an extra column in the data set telling where a trajectory starts (start.event)
Let’s go through the process to build and incorporate the AR(1) model.
Check autocorrelation
Let’s check the residual structure in our full model m6.
This can be done via the itsadug::acf_resid() function.
The plot shows residual autocorrelation at different
lags. For example, the autocorrelation at lag 0 (i.e.,
“0” on the x-axis) shows the value of 1, and this makes sense because
lag 0 shows the correlation of the residuals with themselves. The
residual correlation at Lag 1 is calculated by taking
the f2z values at the time 0, 1, 2, … and correlating them
with the f2z values at time 1, 2, 3 … . Similarly,
lag 2 is the correlation between the f2z values at the time 0, 1, 2 … and those at the time 2, 3, 4 … .
This is somewhat complicated, but the important thing to look for in the plot is that the heights of neighbouring vertical lines decay only gradually. This suggests that residuals are correlated with each other, which is not ideal for accurate model estimation.
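As a sanity check of what "lag 1" means, here is a small base-R sketch on a simulated series (illustrative only; x, lag1_manual and lag1_acf are made-up names, and the series stands in for model residuals):

```r
# a strongly autocorrelated AR(1) series
set.seed(42)
x <- as.numeric(arima.sim(model = list(ar = 0.8), n = 500))

# manual lag-1 correlation: the series against itself shifted by one step
lag1_manual <- cor(x[-length(x)], x[-1])

# acf() computes (near-)identical values; index 1 is lag 0, index 2 is lag 1
lag1_acf <- acf(x, plot = FALSE)$acf[2]
round(c(manual = lag1_manual, acf = lag1_acf), 3)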
Defining rho (\(\rho\))
In order to address the residual autocorrelation, we will first
define a rho value. It is advisable that you explore a wide
range of rho values, but a rule of thumb is to define the
rho value as the autocorrelation at lag 1.
The autocorrelation at lag 1 is approximately 0.783, and we use this as a rho value in the model.
Defining the start of an event
Another thing we need to do is to define the beginning of each
trajectory. This simply marks the row in the data corresponding to
time = 0 as TRUE.
Note: When doing this, make sure that your data is
ordered appropriately. In our case here, the time sequence
needs to be arranged from 0, 10, 20, … 100 for each token.
# let's arrange the data first
df_dyn_L <- df_dyn_L |>
dplyr::arrange(
speaker, file, time
) |>
dplyr::relocate(time) # to move the time column forward
# define the event onset
df_dyn_L$start.event <- df_dyn_L$time == 0
# check data
df_dyn_L |>
dplyr::select(speaker, file, time, f2z, start.event)
## # A tibble: 13,398 × 5
## speaker file time f2z[,1] start.event
## <fct> <chr> <dbl> <dbl> <lgl>
## 1 2d57ke JP_2d57ke_lamb033_0001 0 -0.701 TRUE
## 2 2d57ke JP_2d57ke_lamb033_0001 10 -0.679 FALSE
## 3 2d57ke JP_2d57ke_lamb033_0001 20 -0.630 FALSE
## 4 2d57ke JP_2d57ke_lamb033_0001 30 -0.679 FALSE
## 5 2d57ke JP_2d57ke_lamb033_0001 40 -0.668 FALSE
## 6 2d57ke JP_2d57ke_lamb033_0001 50 -0.648 FALSE
## 7 2d57ke JP_2d57ke_lamb033_0001 60 -0.632 FALSE
## 8 2d57ke JP_2d57ke_lamb033_0001 70 -0.624 FALSE
## 9 2d57ke JP_2d57ke_lamb033_0001 80 -0.668 FALSE
## 10 2d57ke JP_2d57ke_lamb033_0001 90 -0.948 FALSE
## # ℹ 13,388 more rows
As you can see above, the data are arranged according to
time within each token, and the onset of a trajectory
(i.e., where time = 0) is marked TRUE in the
start.event column.
Fitting the full model with AR(1) model
Finally, let’s incorporate the residual autocorrelation correction (i.e., AR(1) model) into our model.
# full model with AR(1) model
m7 <- mgcv::bam(f2z ~ language.ord + context.ord + s(time) + s(time, by = language.ord) + s(time, by = context.ord) + s(time, speaker, bs = "fs", m = 1) + s(time, word, bs = "fs", m = 1), data = df_dyn_L, method = "fREML", rho = rho_m6, AR.start = df_dyn_L$start.event)
## Warning in gam.side(sm, X, tol = .Machine$double.eps^0.5): model has repeated
## 1-d smooths of same variable.
##
## Family: gaussian
## Link function: identity
##
## Formula:
## f2z ~ language.ord + context.ord + s(time) + s(time, by = language.ord) +
## s(time, by = context.ord) + s(time, speaker, bs = "fs", m = 1) +
## s(time, word, bs = "fs", m = 1)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.49581 0.04580 -10.825 <2e-16 ***
## language.ordJapanese 0.03034 0.04054 0.749 0.454
## context.ord/i/ 1.51032 0.04647 32.499 <2e-16 ***
## context.ord/u/ 0.02278 0.05174 0.440 0.660
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(time) 7.241 7.396 19.887 <2e-16 ***
## s(time):language.ordJapanese 7.566 7.842 28.637 <2e-16 ***
## s(time):context.ord/i/ 6.592 6.792 16.342 <2e-16 ***
## s(time):context.ord/u/ 1.000 1.001 0.542 0.462
## s(time,speaker) 312.889 403.000 5.498 <2e-16 ***
## s(time,word) 52.113 69.000 15.366 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.796 Deviance explained = 80.2%
## fREML = 1118.5 Scale est. = 0.15231 n = 13398
Let’s also check whether autocorrelation is accounted for.
The autocorrelation plot shows that the autocorrelation at lag 1 (i.e., the second vertical line from the left) is substantially smaller than in the previous plot. This means that the new model m7 successfully recognises the dependencies between data points, and the residual autocorrelation has been largely eliminated.
Model criticism
We have touched upon some assumptions just now, and it’s a good practice to check whether our model satisfies those assumptions.
##
## Method: fREML Optimizer: perf newton
## full convergence after 17 iterations.
## Gradient range [-3.156672e-05,4.419978e-05]
## (score 1118.485 & scale 0.1523108).
## Hessian positive definite, eigenvalue range [3.155692e-05,6698.08].
## Model rank = 517 / 517
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(time) 9.00 7.24 1 0.59
## s(time):language.ordJapanese 9.00 7.57 1 0.59
## s(time):context.ord/i/ 9.00 6.59 1 0.57
## s(time):context.ord/u/ 9.00 1.00 1 0.55
## s(time,speaker) 405.00 312.89 1 0.52
## s(time,word) 72.00 52.11 1 0.55
The plots here look overall OK, but not ideal. The tails on both ends of the ‘spaghetti-like’ line in the first plot suggest that the residual distribution does not strictly follow the normal distribution: rather, it looks closer to a t-distribution. The histogram also shows a somewhat skewed distribution.
We won’t try correcting these issues in the interest of time, but
existing GAMMs tutorials suggest some ways to address this. One way, for
example, is to tell your GAMMs model to follow a scaled t
distribution by adding family = 'scat' argument.
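As a rough illustration of this fix, the sketch below (simulated data; x, y and m_scat are made-up names) fits a smooth with family = scat() in mgcv, which assumes scaled-t rather than Gaussian residuals:

```r
library(mgcv)

# simulated data with heavy-tailed (t-distributed) noise
set.seed(2)
x <- runif(300)
y <- sin(2 * pi * x) + 0.3 * rt(300, df = 3)

# scaled-t response distribution instead of the default Gaussian
m_scat <- mgcv::gam(y ~ s(x), family = scat(), method = "REML")
m_scat$family$family  # reports the estimated scaled-t family
```

The workshop model m7 could in principle be refitted the same way by adding the family argument to the bam() call.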
But for now, let’s leave it as it is.
Final model: data visualisation
We have covered quite a few things so far and we have not managed to visualise the full model. Let’s do this finally.
# full model with AR(1) model
m7 <- mgcv::bam(f2z ~ language.ord + context.ord + s(time) + s(time, by = language.ord) + s(time, by = context.ord) + s(time, speaker, bs = "fs", m = 1) + s(time, word, bs = "fs", m = 1), data = df_dyn_L, method = "fREML", rho = rho_m6, AR.start = df_dyn_L$start.event)
## Warning in gam.side(sm, X, tol = .Machine$double.eps^0.5): model has repeated
## 1-d smooths of same variable.
# visualisation
## /æ/ context
### separate smooths
plot_smooth(m7, view = "time", plot_all = "language.ord", cond = list(context.ord = "/æ/"))
## Summary:
## * language.ord : factor; set to the value(s): English, Japanese.
## * context.ord : factor; set to the value(s): /æ/.
## * time : numeric predictor; with 30 values ranging from 0.000000 to 100.000000.
## * speaker : factor; set to the value(s): ds6umh. (Might be canceled as random effect, check below.)
## * word : factor; set to the value(s): lamb. (Might be canceled as random effect, check below.)
## * NOTE : The following random effects columns are canceled: s(time,speaker),s(time,word)
##
### difference smooth
plot_diff(m7, view = "time", comp = list(language.ord = c("Japanese", "English")), cond = list(context.ord = "/æ/"), print.summary = FALSE, main = "Japanese-English difference in the /æ/ context")
Your turn
Could you write a code chunk to visualise (1) separate and (2) difference smooths between L1 English and L1 Japanese speakers in the /i/ and /u/ contexts?
Interaction
Finally, let’s briefly think about how to model interactions between
language and context. Modelling interactions
in GAMMs models is highly complex, and I did not try this in my
publication. While I still do not know how to perform significance testing via
model comparison, we could at least try modelling and inspecting the
model summary.
This section also serves as a showcase of a complete analysis workflow based on the things that we’ve covered so far.
Creating the LangCont variable
One way of modelling the categorical \(\times\) categorical interaction in GAMMs
is to concatenate two variables into one. This can be done via
the interaction() function.
# convert variables into factor
df_dyn_L$language <- as.factor(df_dyn_L$language)
df_dyn_L$context <- as.factor(df_dyn_L$context)
df_dyn_L$speaker <- as.factor(df_dyn_L$speaker)
df_dyn_L$word <- as.factor(df_dyn_L$word)
# create an interaction variable
df_dyn_L$LangCont <- interaction(df_dyn_L$language, df_dyn_L$context)
To quickly reiterate, language has two levels (Japanese
vs English) and context has three levels (/æ/ vs /i/ vs
/u/). By combining them together, LangCont can fit a
trajectory for each of the six levels (i.e., two for
language \(\times\) three
for context):
# checking the levels in the LangCont column
df_dyn_L |>
dplyr::group_by(LangCont) |>
dplyr::summarise() |>
dplyr::ungroup()
## # A tibble: 6 × 1
## LangCont
## <fct>
## 1 English./æ/
## 2 Japanese./æ/
## 3 English./i/
## 4 Japanese./i/
## 5 English./u/
## 6 Japanese./u/
This way, we can model the interaction while treating
LangCont as just one variable in the model.
Fitting the first model
We’ll stick to the difference smooths approach, so let’s convert LangCont into an ordered variable.
# converting LangCont into ordered variable
df_dyn_L$LangCont.ord <- as.ordered(df_dyn_L$LangCont)
contrasts(df_dyn_L$LangCont.ord) <- "contr.treatment" # don't forget this!
# fitting a model
m10 <- mgcv::bam(f2z ~ LangCont.ord + s(time) + s(time, by = LangCont.ord) + s(time, speaker, bs = "fs", m = 1) + s(time, word, bs = "fs", m = 1), data = df_dyn_L, method = "fREML")
## Warning in gam.side(sm, X, tol = .Machine$double.eps^0.5): model has repeated
## 1-d smooths of same variable.
##
## Family: gaussian
## Link function: identity
##
## Formula:
## f2z ~ LangCont.ord + s(time) + s(time, by = LangCont.ord) + s(time,
## speaker, bs = "fs", m = 1) + s(time, word, bs = "fs", m = 1)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.29612 0.05487 -5.397 6.89e-08 ***
## LangCont.ordJapanese./æ/ -0.29689 0.05076 -5.849 5.07e-09 ***
## LangCont.ordEnglish./i/ 1.11191 0.05163 21.534 < 2e-16 ***
## LangCont.ordJapanese./i/ 1.51342 0.07141 21.194 < 2e-16 ***
## LangCont.ordEnglish./u/ -0.21483 0.05763 -3.728 0.000194 ***
## LangCont.ordJapanese./u/ -0.10310 0.07578 -1.361 0.173657
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(time) 7.139 7.477 23.657 < 2e-16 ***
## s(time):LangCont.ordJapanese./æ/ 6.604 7.466 47.542 < 2e-16 ***
## s(time):LangCont.ordEnglish./i/ 4.420 5.018 8.549 < 2e-16 ***
## s(time):LangCont.ordJapanese./i/ 5.531 6.249 3.588 0.001264 **
## s(time):LangCont.ordEnglish./u/ 1.000 1.000 11.538 0.000684 ***
## s(time):LangCont.ordJapanese./u/ 5.729 6.669 21.049 < 2e-16 ***
## s(time,speaker) 251.152 403.000 8.546 < 2e-16 ***
## s(time,word) 48.151 69.000 17.876 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.831 Deviance explained = 83.5%
## fREML = 7680.8 Scale est. = 0.17263 n = 13398
Residual autocorrelation corrections
We also need to build an AR(1) model to account for the residual autocorrelation.
# mark start.event
## arrange the data appropriately
df_dyn_L <- df_dyn_L |>
dplyr::arrange(file, time)
## mark time = 0
df_dyn_L$start.event <- df_dyn_L$time == 0
# check residual autocorrelations
itsadug::acf_resid(m10)
Fitting model again
# fitting a model
m11 <- mgcv::bam(f2z ~ LangCont.ord + s(time) + s(time, by = LangCont.ord) + s(time, speaker, bs = "fs", m = 1) + s(time, word, bs = "fs", m = 1), data = df_dyn_L, method = "fREML", rho = rho_m10, AR.start = df_dyn_L$start.event)
## Warning in gam.side(sm, X, tol = .Machine$double.eps^0.5): model has repeated
## 1-d smooths of same variable.
##
## Family: gaussian
## Link function: identity
##
## Formula:
## f2z ~ LangCont.ord + s(time) + s(time, by = LangCont.ord) + s(time,
## speaker, bs = "fs", m = 1) + s(time, word, bs = "fs", m = 1)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.29532 0.05184 -5.697 1.24e-08 ***
## LangCont.ordJapanese./æ/ -0.29656 0.04928 -6.018 1.81e-09 ***
## LangCont.ordEnglish./i/ 1.11110 0.05275 21.062 < 2e-16 ***
## LangCont.ordJapanese./i/ 1.50725 0.06817 22.111 < 2e-16 ***
## LangCont.ordEnglish./u/ -0.21568 0.05860 -3.681 0.000234 ***
## LangCont.ordJapanese./u/ -0.10876 0.07250 -1.500 0.133603
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(time) 7.573 7.684 21.480 < 2e-16 ***
## s(time):LangCont.ordJapanese./æ/ 7.770 8.207 46.032 < 2e-16 ***
## s(time):LangCont.ordEnglish./i/ 5.015 5.458 7.728 1.41e-06 ***
## s(time):LangCont.ordJapanese./i/ 7.151 7.530 4.539 3.62e-05 ***
## s(time):LangCont.ordEnglish./u/ 4.646 5.108 3.095 0.0079 **
## s(time):LangCont.ordJapanese./u/ 6.802 7.287 8.889 < 2e-16 ***
## s(time,speaker) 317.927 403.000 6.102 < 2e-16 ***
## s(time,word) 51.634 69.000 16.127 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.83 Deviance explained = 83.5%
## fREML = 770.17 Scale est. = 0.13281 n = 13398
Looks like all smooth and parametric terms are quite important! Let’s quickly do the model comparison for significance testing.
# full model with ML
m11 <- mgcv::bam(f2z ~ LangCont.ord + s(time) + s(time, by = LangCont.ord) + s(time, speaker, bs = "fs", m = 1) + s(time, word, bs = "fs", m = 1), data = df_dyn_L, method = "ML", rho = rho_m10, AR.start = df_dyn_L$start.event)
## Warning in gam.side(sm, X, tol = .Machine$double.eps^0.5): model has repeated
## 1-d smooths of same variable.
# nested model with ML
m11_2 <- mgcv::bam(f2z ~ s(time) + s(time, speaker, bs = "fs", m = 1) + s(time, word, bs = "fs", m = 1), data = df_dyn_L, method = "ML", rho = rho_m10, AR.start = df_dyn_L$start.event)
## Warning in gam.side(sm, X, tol = .Machine$double.eps^0.5): model has repeated
## 1-d smooths of same variable.
## m11: f2z ~ LangCont.ord + s(time) + s(time, by = LangCont.ord) + s(time,
## speaker, bs = "fs", m = 1) + s(time, word, bs = "fs", m = 1)
##
## m11_2: f2z ~ s(time) + s(time, speaker, bs = "fs", m = 1) + s(time,
## word, bs = "fs", m = 1)
##
## Report suggestion: The Chi-Square test on the ML scores indicates that model m11 is [significantly?] better than model m11_2 (X2(15.00)=578.837, p<2e-16).
## -----
## Model Score Edf Difference Df p.value Sig.
## 1 m11_2 1326.5758 7
## 2 m11 747.7391 22 578.837 15.000 < 2e-16 ***
##
## AIC difference: -989.36, model m11 has lower AIC.
The full model significantly improves the model fit. Let’s further investigate whether this is to do with shape difference.
# full model with ML
m11 <- mgcv::bam(f2z ~ LangCont.ord + s(time) + s(time, by = LangCont.ord) + s(time, speaker, bs = "fs", m = 1) + s(time, word, bs = "fs", m = 1), data = df_dyn_L, method = "ML", rho = rho_m10, AR.start = df_dyn_L$start.event)
## Warning in gam.side(sm, X, tol = .Machine$double.eps^0.5): model has repeated
## 1-d smooths of same variable.
# nested model with ML 2 -- parametric term only
m11_3 <- mgcv::bam(f2z ~ LangCont.ord + s(time) + s(time, speaker, bs = "fs", m = 1) + s(time, word, bs = "fs", m = 1), data = df_dyn_L, method = "ML", rho = rho_m10, AR.start = df_dyn_L$start.event)
## Warning in gam.side(sm, X, tol = .Machine$double.eps^0.5): model has repeated
## 1-d smooths of same variable.
## m11: f2z ~ LangCont.ord + s(time) + s(time, by = LangCont.ord) + s(time,
## speaker, bs = "fs", m = 1) + s(time, word, bs = "fs", m = 1)
##
## m11_3: f2z ~ LangCont.ord + s(time) + s(time, speaker, bs = "fs", m = 1) +
## s(time, word, bs = "fs", m = 1)
##
## Report suggestion: The Chi-Square test on the ML scores indicates that model m11 is [significantly?] better than model m11_3 (X2(10.00)=407.254, p<2e-16).
## -----
## Model Score Edf Difference Df p.value Sig.
## 1 m11_3 1154.9934 12
## 2 m11 747.7391 22 407.254 10.000 < 2e-16 ***
##
## AIC difference: -677.36, model m11 has lower AIC.
The full model still improves the model fit at a statistically significant level. Overall, this suggests that the interaction between language and context has a statistically significant effect on trajectory shape.
Data visualisation
separate smooths using itsadug
Let’s check if the f2z trajectories are modelled accurately. We’ll first check the separate smooths for each vowel context within each language group, which seem to suggest that the modelling looks good.
## # A tibble: 6 × 1
## LangCont.ord
## <ord>
## 1 English./æ/
## 2 Japanese./æ/
## 3 English./i/
## 4 Japanese./i/
## 5 English./u/
## 6 Japanese./u/
# separate smooths 1
## English
plot_smooth(m11, view = "time", plot_all = "LangCont.ord", cond = list(LangCont.ord = c("English./i/", "English./æ/", "English./u/")))
## Summary:
## * LangCont.ord : factor; set to the value(s): English./æ/, English./i/, English./u/.
## * time : numeric predictor; with 30 values ranging from 0.000000 to 100.000000.
## * speaker : factor; set to the value(s): ds6umh. (Might be canceled as random effect, check below.)
## * word : factor; set to the value(s): lamb. (Might be canceled as random effect, check below.)
## * NOTE : The following random effects columns are canceled: s(time,speaker),s(time,word)
##
## Japanese
plot_smooth(m11, view = "time", plot_all = "LangCont.ord", cond = list(LangCont.ord = c("Japanese./i/", "Japanese./æ/", "Japanese./u/")))
## Summary:
## * LangCont.ord : factor; set to the value(s): Japanese./æ/, Japanese./i/, Japanese./u/.
## * time : numeric predictor; with 30 values ranging from 0.000000 to 100.000000.
## * speaker : factor; set to the value(s): ds6umh. (Might be canceled as random effect, check below.)
## * word : factor; set to the value(s): lamb. (Might be canceled as random effect, check below.)
## * NOTE : The following random effects columns are canceled: s(time,speaker),s(time,word)
##
separate and difference smooths using itsadug
Now let’s check how L1 English and L1 Japanese speakers differ in the patterns of their f2z trajectories. We’ll visualise separate by-group smooths for each vowel context first, followed by difference smooths encoding the by-group differences.
# /i/
## separate smooths
plot_smooth(m11, view = "time", plot_all = "LangCont.ord", cond = list(LangCont.ord = c("English./i/", "Japanese./i/")))
## Summary:
## * LangCont.ord : factor; set to the value(s): English./i/, Japanese./i/.
## * time : numeric predictor; with 30 values ranging from 0.000000 to 100.000000.
## * speaker : factor; set to the value(s): ds6umh. (Might be canceled as random effect, check below.)
## * word : factor; set to the value(s): lamb. (Might be canceled as random effect, check below.)
## * NOTE : The following random effects columns are canceled: s(time,speaker),s(time,word)
##
## difference smooth
plot_diff(m11, view = "time", comp = list(LangCont.ord = c("English./i/", "Japanese./i/")), print.summary = FALSE, main = "Japanese-English difference in the /i/ context")
# /æ/
## separate smooths
plot_smooth(m11, view = "time", plot_all = "LangCont.ord", cond = list(LangCont.ord = c("English./æ/", "Japanese./æ/")))
## Summary:
## * LangCont.ord : factor; set to the value(s): English./æ/, Japanese./æ/.
## * time : numeric predictor; with 30 values ranging from 0.000000 to 100.000000.
## * speaker : factor; set to the value(s): ds6umh. (Might be canceled as random effect, check below.)
## * word : factor; set to the value(s): lamb. (Might be canceled as random effect, check below.)
## * NOTE : The following random effects columns are canceled: s(time,speaker),s(time,word)
##
## difference smooth
plot_diff(m11, view = "time", comp = list(LangCont.ord = c("English./æ/", "Japanese./æ/")), print.summary = FALSE, main = "Japanese-English difference in the /æ/ context")
# /u/
## separate smooths
plot_smooth(m11, view = "time", plot_all = "LangCont.ord", cond = list(LangCont.ord = c("English./u/", "Japanese./u/")))
## Summary:
## * LangCont.ord : factor; set to the value(s): English./u/, Japanese./u/.
## * time : numeric predictor; with 30 values ranging from 0.000000 to 100.000000.
## * speaker : factor; set to the value(s): ds6umh. (Might be canceled as random effect, check below.)
## * word : factor; set to the value(s): lamb. (Might be canceled as random effect, check below.)
## * NOTE : The following random effects columns are canceled: s(time,speaker),s(time,word)
##
## difference smooth
plot_diff(m11, view = "time", comp = list(LangCont.ord = c("English./u/", "Japanese./u/")), print.summary = FALSE, main = "Japanese-English difference in the /u/ context")
Summary
We’ve covered so many things! Here is a brief summary of what has been covered today:
Static data analysis does not always highlight between-group differences accurately.
As our comparison of static and dynamic data visualisation suggests, it is important to think about how the data points are sampled for static analysis. It might be the case that two groups happen to look similar simply because two time-varying contours cross over at the measurement point.
Generalised Additive Mixed Modelling can model non-linear between-group difference over time
By combining parametric and smooth terms, GAMMs can capture non-linear between-group difference in a response variable over time. We also learnt a modelling approach using difference smooths that can encode the between-group difference directly into the model.
Always visualise the data!
While we have tried interpreting the model summary a few times, it is still quite complex. This is why data visualisation is really important in non-linear modelling like GAMMs.
Wrap-up question
How did you find GAMMs so far? What types of data/research questions might be suitable to be tested through GAMMs?
References
The GAMMs section draws heavily on the following existing tutorials and empirical papers. All of these cover GAMMs modelling in great detail, and I would highly recommend them if you would like to know more about GAMMs.
Xu, C. (2024). Cross-dialectal perspectives on Mandarin neutral tone. Journal of Phonetics, 106, 101341. https://doi.org/10.1016/j.wocn.2024.101341
Chuang, Y.-Y., Fon, J., Papakyritsis, I., Baayen, H., & Ball, M. J. (2021). Analyzing Phonetic Data with Generalized Additive Mixed Models. In M. Ball (Ed.), Manual of Clinical Phonetics (1st ed., Vol. 1, pp. 108–138). Routledge. https://doi.org/10.4324/9780429320903-10
Coretta, S. (2024). Learn Generalised Additive (Mixed) Models. Online tutorial. https://stefanocoretta.github.io/learnGAM/
Kirkham, S., Nance, C., Littlewood, B., Lightfoot, K., & Groarke, E. (2019). Dialect variation in formant dynamics: The acoustics of lateral and vowel sequences in Manchester and Liverpool English. The Journal of the Acoustical Society of America, 145(2), 784–794. https://doi.org/10.1121/1.5089886
Sóskuthy, M. (2017). Generalised Additive Mixed Models for dynamic analysis in linguistics: A practical introduction. arXiv:1703.05339 [stat.AP].
Sóskuthy, M., Foulkes, P., Hughes, V., & Haddican, B. (2018). Changing words and sounds: the roles of different cognitive units in sound change. Topics in Cognitive Science, 10, 787–802.
Sóskuthy, M. (2021). Evaluating generalised additive mixed modelling strategies for dynamic speech analysis. Journal of Phonetics, 84, 101017. https://doi.org/10.1016/j.wocn.2020.101017
Steffman, J. (2022). Modeling intonational tunes with Generalized Additive Mixed Models: A practical introduction. Online tutorial. https://jsteffman.github.io/GAMM_intonation_modeling.html
Wieling, M. (2018). Analyzing dynamic phonetic data using generalized additive mixed modeling: A tutorial focusing on articulatory differences between L1 and L2 speakers of English. Journal of Phonetics, 70, 86–116. https://doi.org/10.1016/j.wocn.2018.03.002